-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace stop
with return
and provide and bmi_failure
flags for EnergyModule error
#108
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like some changes may need to be made up the call stack from EnergyMain
to avoid weird side effects and consequences.
As I read it, we're expecting a stack like this where the error occurs:
noahowp_update_until
noahowp_update
advance_in_time
solve_noahowp
EnergyMain
When we hit the early return from EnergyMain
, there's still the following call to WaterMain
in solve_noahowp
that'll run, unless error checking and early returns are propagated.
I agree with @PhilMiller that we need to immediately propagate an error up the call stack to avoid side effects of a late return from an error. I don’t see how to avoid conditional statements that check for errors before the end of a subroutine but most of the stop statements come from initialization checks at the beginning of a model run. In that case, conditional statements would have minimal impact on execution speed so the remaining consideration is with cluttering the code with conditional statements. Based on the example in PR #108, I’ve prototyped dealing with upstream error propagation. Logging functionality is included (in log_error() subroutine) such that write statements are contained within a single subroutine. log_error() would replace handle_err() and sys_abort() subroutines. Consistent with NGEN, existing preprocessor directives would control program flow (in driver/NoahModularDriver.f90) and logging status for non-fatal messages (in log_error() . Subroutines that include an error check would need to use ErrorCheckModule. ErrorCheckModule contains three public parameters:
The values of NOM_SUCCESS and NOM_FAILURE are consistent with BMI_SUCCESS and BMI_FAILURE. Impacted files are:
File additions/changes are:
|
Offering my thoughts for consideration, @SnowHydrology First, on the overall approach, the use of an error flag variable seems appropriate (e.g., Second, regarding the error message, I'd advocate for saving the error message and printing when NGEN would print the time step and location information for the error (if possible). Instead, this code (to my current understanding) calls Third, @drakest123, I'm not sure if you meant to add your commits atop those from @SnowHydrology in his |
src/DomainType.f90
Outdated
@@ -31,6 +32,8 @@ module DomainType | |||
integer :: croptype ! crop type | |||
integer :: isltyp ! soil type | |||
integer :: IST ! surface type 1-soil; 2-lake | |||
integer :: error_flag ! flag for energy balance error (0 = no error, 1 = longwave < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use ErrorCheckModule
to deal with the error messages, we may want to move the error_flag
flag to ErrorCheckModule
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a possible solution although I'd need to also review how domain%error_flag is used apart from how we are using it for the current set of updates. Another possibility is to bind a new error object to the model% object and dissociate error_flag from domain%.
driver/AsciiReadModule.f90
Outdated
|
||
implicit none | ||
|
||
contains | ||
|
||
subroutine open_forcing_file(filename) | ||
subroutine open_forcing_file(filename, error_flag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than explicitly passing error_flag
, could you just make error_flag
a public variable in ErrorCheckModule
and set it in anywhere that uses ErrorCheckModule
?
Along those same lines, could you make a public character string in ErrorCheckModule
and write any error message to that variable when the error is encountered? Then use ErrorCheckModule
in bmi_noahowp.f90
to be able to print the error message there (when the error code has already been passed up the stack)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would be possible to remove error_flag from some of the intermediate call signatures. That would clean up some of the call signatures at the expense of requiring programming care not to co-opt that variable for other uses.
Passing the error string up the stack would require adding an error string to the domain object (as currently constructed). I agree that would be more complete. That is Keith's call. If we do this, then it would make sense to follow through with your first suggestion about removing error_flag from some call signatures. When passing these to two variables up to BMI, we would need to either pass a single error object or pass both the error code and error string. I'd prefer to avoid binding ErrorCheckModule to BMI.
stop ": ERROR EXIT" | ||
error_flag = NOM_FAILURE | ||
write(error_string,'(A,A,A)') "AsciiReadModule.f90:open_forcing_file(): File: ",trim(filename), " does not exist. Check the forcing file specified as a command-line argument" | ||
call log_message(error_flag, error_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment regarding writing to a possible public character variable in ErrorCheckModule
instead of calling log_message
to print right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, seems like kind of a bummer that we'd have to manually record where in the program the error occurs via the error message. My understanding is that this is because we're going to use return
statements instead of stop
statements in which case the program won't record any locational information (i.e., where in the code the error happens). Is there any way to automatically record locational information to pass along side the error code and message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this code is being run through the C pre-processor (as use of #ifdef NGEN_ACTIVE
suggests), then we could define a macro along the lines of
#define NOAH_OWP_MODULAR_REPORT_ERROR(error_string) \
error_flag = NOM_FAILURE \
call log_error_message(error_flag, error_string, __FILE__, __LINE__)
Those __FILE__
and __LINE__
macros would capture the location of the error report, though not the function name
I'm a touch concerned that these files might not get pre-processed as expected, since that usually seems to go with a .F90
(upper-case) file extension. As I understand it, though, that's just a convention, and the build scripts could do whatever. The shift to a cmake-driven build system may change that, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered the preprocessor approach that @PhilMiller discusses. Besides the issues with this approach that Phil brings up, I don't think that using preprocessor directives for delineating error sources would be appropriate. Currently, the program has sparse reliance on preprocessor directives and they are used for major functional conditions such as whether the code is run standalone or as a module. These few preprocessor directives could easily be removed so the reliance on them is easy to decouple.
One possible intermediate improvement would be to define a string that contains the subroutine name and another string with module scope that contains the module name. Then, the error string would refer to those variables instead of them being hardwired. Given that LINE would unambiguously identify the line where the error is reported, unique error messages alternatively identify error sources although the onus is more on the programmer.
Updated error handling, addressing some of Grey Evenson's suggestions. Ready for review by @SnowHydrology. Changes include:
|
This PR replaces the
stop
statement withreturn
inEnergyModule.f90
. This will allow the NextGen framework to provide basin and timestep information whenBMI_FAILURE
is reported.Changes:
stop
withreturn
and assign error flag value